# Native Multimodal Pretraining
InternVL3 38B Instruct GGUF
Apache-2.0
InternVL3-38B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional overall performance, with strong multimodal perception and reasoning capabilities.
Image-to-Text
Transformers

unsloth
1,236
2
InternVL3 8B GGUF
Apache-2.0
InternVL3 is an advanced multimodal large language model series, demonstrating exceptional overall performance with robust multimodal perception and reasoning capabilities.
Image-to-Text
Transformers

unsloth
4,810
3
InternVL3 9B AWQ
MIT
InternVL3-9B is a multimodal large language model from the InternVL3 series, featuring exceptional multimodal perception and reasoning capabilities. It supports various application scenarios such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text
Transformers Other

OpenGVLab
214
1
InternVL3 8B AWQ
Other
InternVL3-8B is an advanced multimodal large language model developed by OpenGVLab, featuring powerful multimodal perception and reasoning capabilities, supporting tool calling, GUI agents, industrial image analysis, 3D visual perception, and other emerging fields.
Image-to-Text
Transformers Other

OpenGVLab
1,441
3
InternVL3 2B AWQ
Other
InternVL3-2B is an advanced Multimodal Large Language Model (MLLM) developed by OpenGVLab, featuring exceptional multimodal perception and reasoning capabilities, supporting tool usage, GUI agents, industrial image analysis, 3D visual perception, and more.

OpenGVLab
677
1
InternVL3 1B AWQ
Other
InternVL3-1B is a multimodal large language model in the InternVL3 series, featuring exceptional multimodal perception and reasoning capabilities.
Image-to-Text
Transformers Other

OpenGVLab
303
1
InternVL3 2B Pretrained
Apache-2.0
InternVL3-2B is an advanced multimodal large language model developed by OpenGVLab, featuring robust visual-language understanding and reasoning capabilities, supporting various multimodal tasks.
Image-to-Text
Transformers Other

OpenGVLab
61
1
InternVL3 9B Instruct
MIT
InternVL3-9B-Instruct is a supervised fine-tuned model in the InternVL3 series, featuring powerful multimodal perception and reasoning capabilities and supporting image, text, and video inputs.
Image-to-Text
Transformers Other

OpenGVLab
220
2
InternVL3 8B Instruct
Other
InternVL3-8B-Instruct is an advanced Multimodal Large Language Model (MLLM) that demonstrates exceptional multimodal perception and reasoning capabilities, supporting various functionalities such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text
Transformers Other

OpenGVLab
885
2
InternVL3 2B Instruct
Apache-2.0
InternVL3-2B-Instruct is the supervised fine-tuned (SFT) version of InternVL3-2B. It has undergone native multimodal pretraining followed by SFT, and offers powerful multimodal perception and reasoning capabilities.
Image-to-Text
Transformers Other

OpenGVLab
1,345
4
InternVL3 1B Instruct
Apache-2.0
InternVL3-1B-Instruct is a supervised fine-tuned model in the InternVL3 series, built on native multimodal pretraining, with exceptional multimodal perception and reasoning capabilities.
Image-to-Text
Transformers Other

OpenGVLab
705
5
InternVL3 78B Instruct
Other
InternVL3-78B-Instruct is an advanced multimodal large language model developed by OpenGVLab, demonstrating exceptional multimodal perception and reasoning capabilities, supporting various tasks such as tool usage, GUI agents, industrial image analysis, and 3D visual perception.
Image-to-Text
Transformers Other

OpenGVLab
345
5
InternVL3 1B
Other
InternVL3-1B is a 1B-parameter multimodal large language model in the InternVL3 series, integrating the InternViT visual encoder and Qwen2.5 language model, with exceptional multimodal perception and reasoning capabilities.

FriendliAI
71
1